{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "368686b4-f487-4dd4-aeff-37823976529d",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/multi_modal/dashscope_multi_modal.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "# Multi-Modal LLM using DashScope qwen-vl model for image reasoning\n",
    "\n",
    "In this notebook, we show how to use DashScope qwen-vl MultiModal LLM class/abstraction for image understanding/reasoning.\n",
    "Async is not currently supported\n",
    "\n",
    "We also show several functions we are now supporting for DashScope LLM:\n",
    "* `complete` (sync): for a single prompt and list of images\n",
    "* `chat` (sync): for multiple chat messages\n",
    "* `stream complete` (sync): for steaming output of complete\n",
    "* `stream chat` (sync): for steaming output of chat\n",
    "* multi round conversation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc691ca8",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -U dashscope"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4479bf64",
   "metadata": {},
   "source": [
    "##  Use DashScope to understand Images from URLs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5455d8c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set API key\n",
    "%env DASHSCOPE_API_KEY='YOUR_DASHSCOPE_API_KEY'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d0d083e",
   "metadata": {},
   "source": [
    "## Initialize `DashScopeMultiModal` and Load Images from URLs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8725b6d2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.multi_modal_llms import (\n",
    "    DashScopeMultiModal,\n",
    "    DashScopeMultiModalModels,\n",
    ")\n",
    "\n",
    "from llama_index.multi_modal_llms.generic_utils import (\n",
    "    load_image_urls,\n",
    ")\n",
    "\n",
    "\n",
    "image_urls = [\n",
    "    \"https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg\",\n",
    "]\n",
    "\n",
    "image_documents = load_image_urls(image_urls)\n",
    "\n",
    "dashscope_multi_modal_llm = DashScopeMultiModal(\n",
    "    model_name=DashScopeMultiModalModels.QWEN_VL_MAX,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbd9c116",
   "metadata": {},
   "source": [
    "### Complete a prompt with images"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c96ab53e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The image captures a serene moment on a sandy beach at sunset. A woman, dressed in a blue and white plaid shirt, is seated on the ground. She is holding a treat in her hand, which is being gently taken by a dog. The dog, wearing a blue harness, is sitting next to the woman, its paw resting on her leg. The backdrop of this heartwarming scene is the vast ocean, with the sun setting in the distance, casting a warm glow over the entire landscape. The image beautifully encapsulates the bond between the woman and her dog, set against the tranquil beauty of nature.\n"
     ]
    }
   ],
   "source": [
    "complete_response = dashscope_multi_modal_llm.complete(\n",
    "    prompt=\"What's in the image?\",\n",
    "    image_documents=image_documents,\n",
    ")\n",
    "print(complete_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e043cc59",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There is a dog in Picture 1, and there is a panda in Picture 2.\n"
     ]
    }
   ],
   "source": [
    "### Complete a prompt with multi images\n",
    "multi_image_urls = [\n",
    "    \"https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg\",\n",
    "    \"https://dashscope.oss-cn-beijing.aliyuncs.com/images/panda.jpeg\",\n",
    "]\n",
    "\n",
    "multi_image_documents = load_image_urls(multi_image_urls)\n",
    "complete_response = dashscope_multi_modal_llm.complete(\n",
    "    prompt=\"What animals are in the pictures?\",\n",
    "    image_documents=multi_image_documents,\n",
    ")\n",
    "print(complete_response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26ff28b6",
   "metadata": {},
   "source": [
    "### Steam Complete a prompt with a bunch of images"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eab28aa6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The image captures a serene moment on a sandy beach at sunset. A woman, dressed in a blue and white plaid shirt, is seated on the ground. She is holding a treat in her hand, which is being gently taken by a dog. The dog, wearing a blue harness, is sitting next to the woman, its paw resting on her leg. The backdrop of this heartwarming scene is the vast ocean, with the sun setting in the distance, casting a warm glow over the entire landscape. The image beautifully encapsulates the bond between the woman and her dog, set against the tranquil beauty of nature."
     ]
    }
   ],
   "source": [
    "stream_complete_response = dashscope_multi_modal_llm.stream_complete(\n",
    "    prompt=\"What's in the image?\",\n",
    "    image_documents=image_documents,\n",
    ")\n",
    "\n",
    "for r in stream_complete_response:\n",
    "    print(r.delta, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6cc9d04",
   "metadata": {},
   "source": [
    "### multi round conversation with chat messages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "555bb503",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The image shows two photos of a panda sitting on a wooden log in an enclosure. In the top photo, the panda is sitting upright with its front paws on the log, facing three crows that are perched on the log. The panda looks alert and curious, while the crows seem to be observing the panda. In the bottom photo, the panda is lying down on the log, its head resting on its front paws. One crow has landed on the ground next to the log, and it seems to be interacting with the panda. The background of the photo shows green plants and a wire fence, creating a natural and relaxed atmosphere.\n",
      "The woman is sitting on the beach with her dog, and they are giving each other high fives. The panda and the crows are sitting together on a log, and the panda seems to be communicating with the crows.\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core.llms.types import MessageRole\n",
    "from llama_index.multi_modal_llms.dashscope_utils import (\n",
    "    create_dashscope_multi_modal_chat_message,\n",
    ")\n",
    "\n",
    "chat_message_user_1 = create_dashscope_multi_modal_chat_message(\n",
    "    \"What's in the image?\", MessageRole.USER, image_documents\n",
    ")\n",
    "chat_response = dashscope_multi_modal_llm.chat([chat_message_user_1])\n",
    "print(chat_response.message.content[0][\"text\"])\n",
    "chat_message_assistent_1 = create_dashscope_multi_modal_chat_message(\n",
    "    chat_response.message.content[0][\"text\"], MessageRole.ASSISTANT, None\n",
    ")\n",
    "chat_message_user_2 = create_dashscope_multi_modal_chat_message(\n",
    "    \"what are they doing?\", MessageRole.USER, None\n",
    ")\n",
    "chat_response = dashscope_multi_modal_llm.chat(\n",
    "    [chat_message_user_1, chat_message_assistent_1, chat_message_user_2]\n",
    ")\n",
    "print(chat_response.message.content[0][\"text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa8b719f",
   "metadata": {},
   "source": [
    "### Stream Chat through a list of chat messages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c23d7140",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The woman is sitting on the beach, holding a treat in her hand, while the dog is sitting next to her, taking the treat from her hand."
     ]
    }
   ],
   "source": [
    "stream_chat_response = dashscope_multi_modal_llm.stream_chat(\n",
    "    [chat_message_user_1, chat_message_assistent_1, chat_message_user_2]\n",
    ")\n",
    "for r in stream_chat_response:\n",
    "    print(r.delta, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8738293",
   "metadata": {},
   "source": [
    "###  Use images from local files\n",
    " Use local file:  \n",
    "    Linux&mac file schema: file:///home/images/test.png  \n",
    "    Windows file schema: file://D:/images/abc.png  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e91ec1b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There is a dog in Picture 1, and there is a panda in Picture 2.\n"
     ]
    }
   ],
   "source": [
    "from llama_index.multi_modal_llms.dashscope_utils import load_local_images\n",
    "\n",
    "local_images = [\n",
    "    \"file://THE_FILE_PATH1\",\n",
    "    \"file://THE_FILE_PATH2\",\n",
    "]\n",
    "\n",
    "image_documents = load_local_images(local_images)\n",
    "chat_message_local = create_dashscope_multi_modal_chat_message(\n",
    "    \"What animals are in the pictures?\", MessageRole.USER, image_documents\n",
    ")\n",
    "chat_response = dashscope_multi_modal_llm.chat([chat_message_local])\n",
    "print(chat_response.message.content[0][\"text\"])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
